AITopics | stochastic shortest path problem

Collaborating Authors

stochastic shortest path problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RecursiveReinforcementLearning

Name

Neural Information Processing SystemsFeb-19-2026, 16:00:39 GMT

Recursion is the fundamental paradigm to finitely describe potentially infinite objects.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Washington > King County > Seattle (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Solving Constrained Stochastic Shortest Path Problems with Scalarisation

Schmalz, Johannes, Trevizan, Felipe

arXiv.org Artificial IntelligenceAug-26-2025

Constrained Stochastic Shortest Path Problems (CSSPs) model problems with probabilistic effects, where a primary cost is min-imised subject to constraints over secondary costs, e.g., minimise time subject to monetary budget. Current heuristic search algorithms for CSSPs solve a sequence of increasingly larger CSSPs as linear programs until an optimal solution for the original CSSP is found. In this paper, we introduce a novel algorithm CARL, which solves a series of unconstrained Stochastic Shortest Path Problems (SSPs) with efficient heuristic search algorithms. These SSP subproblems are constructed with scalarisations that project the CSSP's vector of primary and secondary costs onto a scalar cost. CARL finds a maximising scalarisation using an optimisation algorithm similar to the subgradient method which, together with the solution to its associated SSP, yields a set of policies that are combined into an optimal policy for the CSSP . Our experiments show that CARL solves 50% more problems than the state-of-the-art on existing benchmarks.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.17446

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Finite-Sample Analysis of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

Chen, Suei-Wen, Ross, Keith, Youssef, Pierre

arXiv.org Artificial IntelligenceOct-3-2024

Monte Carlo Exploring Starts (MCES), which aims to learn the optimal policy using only sample returns, is a simple and natural algorithm in reinforcement learning which has been shown to converge under various conditions. However, the convergence rate analysis for MCES-style algorithms in the form of sample complexity has received very little attention. In this paper we develop a finite sample bound for a modified MCES algorithm which solves the stochastic shortest path problem. To this end, we prove a novel result on the convergence rate of the policy iteration algorithm. This result implies that with probability at least $1-\delta$, the algorithm returns an optimal policy after $\tilde{O}(SAK^3\log^3\frac{1}{\delta})$ sampled episodes, where $S$ and $A$ denote the number of states and actions respectively, $K$ is a proxy for episode length, and $\tilde{O}$ hides logarithmic factors and constants depending on the rewards of the environment that are assumed to be known.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2410.02994

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Efficient Constraint Generation for Stochastic Shortest Path Problems

Schmalz, Johannes, Trevizan, Felipe

arXiv.org Artificial IntelligenceJan-25-2024

Current methods for solving Stochastic Shortest Path Problems (SSPs) find states' costs-to-go by applying Bellman backups, where state-of-the-art methods employ heuristics to select states to back up and prune. A fundamental limitation of these algorithms is their need to compute the cost-to-go for every applicable action during each state backup, leading to unnecessary computation for actions identified as sub-optimal. We present new connections between planning and operations research and, using this framework, we address this issue of unnecessary computation by introducing an efficient version of constraint generation for SSPs. This technique allows algorithms to ignore sub-optimal actions and avoid computing their costs-to-go. We also apply our novel technique to iLAO* resulting in a new algorithm, CG-iLAO*. Our experiments show that CG-iLAO* ignores up to 57% of iLAO*'s actions and it solves problems up to 8x and 3x faster than LRTDP and iLAO*.

algorithm, cg-ilao, partial ssp, (14 more...)

arXiv.org Artificial Intelligence

2401.14636

Country: Africa > Togo (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.46)

Add feedback

On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs

Dong, Zixuan, Wang, Che, Ross, Keith

arXiv.org Artificial IntelligenceSep-6-2022

In reinforcement learning, Monte Carlo algorithms update the Q function by averaging the episodic returns. In the Monte Carlo UCB (MC-UCB) algorithm, the action taken in each state is the action that maximizes the Q function plus a UCB exploration term, which biases the choice of actions to those that have been chosen less frequently. Although there has been significant work on establishing regret bounds for MC-UCB, most of that work has been focused on finite-horizon versions of the problem, for which each episode terminates after a constant number of steps. For such finite-horizon problems, the optimal policy depends both on the current state and the time within the episode. However, for many natural episodic problems, such as games like Go and Chess and robotic tasks, the episode is of random length and the optimal policy is stationary. For such environments, it is an open question whether the Q-function in MC-UCB will converge to the optimal Q function; we conjecture that, unlike Q-learning, it does not converge for all MDPs. We nevertheless show that for a large class of MDPs, which includes stochastic MDPs such as blackjack and deterministic MDPs such as Go, the Q-function in MC-UCB converges almost surely to the optimal Q function. An immediate corollary of this result is that it also converges almost surely for all finite-horizon MDPs. We also provide numerical experiments, providing further insights into MC-UCB.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2209.02864

Country:

North America > United States > New York (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

On the convergence of optimistic policy iteration for stochastic shortest path problem

Chen, Yuanlong

arXiv.org Machine LearningAug-29-2018

In this paper, we prove some convergence results of a special case of optimistic policy iteration algorithm for stochastic shortest path problem mentioned in [5] . We consider both Monte Carlo and TD(λ) methods for the policy evaluation step under the condition that termination state will eventually be reached almost surely.

artificial intelligence, machine learning, stochastic shortest path problem, (12 more...)

arXiv.org Machine Learning

1808.08763

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions

Isaza, Alejandro, Szepesvari, Csaba, Bulitko, Vadim, Greiner, Russell

arXiv.org Artificial IntelligenceJun-13-2012

In this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov Decision Problems (MDP). We focus on medium-size problems whose state space can be fully enumerated. This problem has numerous important applications, such as navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy for the most abstract level and then recursively refining it into a solution to the original problem. This approach is fully automated and delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver on sample problems while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions.

abstract action, abstraction, probability, (16 more...)

arXiv.org Artificial Intelligence

1206.3233

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Add feedback

Short-Sighted Stochastic Shortest Path Problems

Trevizan, Felipe W. (Carnegie Mellon University) | Veloso, Manuela M. (Carnegie Mellon University)

AAAI ConferencesJun-8-2012

Two extreme approaches can be applied to solve a probabilistic planning problem, namely closed loop algorithms and open loop (a.k.a. replanning) algorithms. While closed loop algorithms invest significant computational effort to generate a closed form solution, open loop algorithms compute open form solutions and interact with the environment in order to refine the computed solution. In this paper, we introduce short-sighted Stochastic Shortest Path (SSP), a new model in which solutions computed based on it can be executed for at least t steps as a closed form solution. Using short-sighted SSPs, we present a novel probabilistic planner called Short-sighted Open Loop Planner (SOLP) that bridges the gap between open and closed loop planners by varying the parameter t: as t increases, more actions can be executed without replanning and, for t sufficiently large, a closed form solution is obtained. We prove that SOLP is asymptotically optimal. To the best of our knowledge, SOLP is the unique probabilistic planner that at the same time provides both replanning and optimality guarantees. We empirically compare SOLP with the winners of the previous probabilistic planning competitions and SOLP outperforms all of them in 33.3% of the problems and ties with the best planner in 48.3% of the problems.

goal state, ssipp, ssp, (15 more...)

AAAI Conferences

Twenty-Second International Conference on Automated Planning and Scheduling

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Add feedback

Suboptimality Bounds for Stochastic Shortest Path Problems

Hansen, Eric A.

arXiv.org Artificial IntelligenceFeb-14-2012

We consider how to use the Bellman residual of the dynamic programming operator to compute suboptimality bounds for solutions to stochastic shortest path problems. Such bounds have been previously established only in the special case that "all policies are proper," in which case the dynamic programming operator is known to be a contraction, and have been shown to be easily computable only in the more limited special case of discounting. Under the condition that transition costs are positive, we show that suboptimality bounds can be easily computed even when not all policies are proper. In the general case when there are no restrictions on transition costs, the analysis is more complex. But we present preliminary results that show such bounds are possible.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

1202.3729

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.91)

Add feedback